26 research outputs found

    Revisiting the Onsets and Frames Model with Additive Attention

    Accepted in IJCNN 2021 Special Session S04. https://dr-costas.github.io/rlasmp2021-website/
    Recent advances in automatic music transcription (AMT) have achieved highly accurate polyphonic piano transcription by incorporating onset and offset detection. The existing literature, however, focuses mainly on leveraging deep and complex models to achieve state-of-the-art (SOTA) accuracy, without understanding model behaviour. In this paper, we conduct a comprehensive examination of the Onsets-and-Frames AMT model and pinpoint the essential components that contribute to strong AMT performance, using a modified additive attention mechanism as a probe. The experimental results suggest that attention beyond a moderate temporal context does not benefit the model, and that rule-based post-processing is largely responsible for the SOTA performance. We also demonstrate that onsets are the most significant attentive feature regardless of model complexity. These findings encourage AMT research to place more weight on both a robust onset detector and an effective post-processor.
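
    As a point of reference for the attention mechanism discussed above, here is a minimal PyTorch sketch of standard additive (Bahdanau-style) attention; the class name, dimensions, and single-query formulation are illustrative assumptions, not the paper's modified variant.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Bahdanau-style additive attention: score(q, k) = v^T tanh(W_q q + W_k k)."""
    def __init__(self, query_dim: int, key_dim: int, hidden_dim: int):
        super().__init__()
        self.w_query = nn.Linear(query_dim, hidden_dim, bias=False)
        self.w_key = nn.Linear(key_dim, hidden_dim, bias=False)
        self.v = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, query: torch.Tensor, keys: torch.Tensor):
        # query: (batch, query_dim); keys: (batch, time, key_dim)
        scores = self.v(torch.tanh(self.w_query(query).unsqueeze(1) + self.w_key(keys)))
        weights = torch.softmax(scores, dim=1)   # (batch, time, 1), sums to 1 over time
        context = (weights * keys).sum(dim=1)    # (batch, key_dim): weighted sum of frames
        return context, weights.squeeze(-1)

# Example: attend over a 64-frame temporal context of 128-dim acoustic features.
attn = AdditiveAttention(query_dim=128, key_dim=128, hidden_dim=64)
context, weights = attn(torch.randn(2, 128), torch.randn(2, 64, 128))
```

    The paper's finding that attention beyond a moderate temporal context does not help would correspond here to shrinking the time dimension of `keys` without losing accuracy.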

    The effect of spectrogram reconstructions on automatic music transcription: an alternative approach to improve transcription accuracy

    Most state-of-the-art automatic music transcription (AMT) models break the main transcription task into sub-tasks such as onset prediction and offset prediction and train them with onset and offset labels. These predictions are then concatenated and used as input to train another model with pitch labels to obtain the final transcription. We attempt to use only the pitch labels (together with a spectrogram reconstruction loss) and explore how far such a model can go without introducing supervised sub-tasks. In this paper, we do not aim at achieving state-of-the-art transcription accuracy; instead, we explore the effect that spectrogram reconstruction has on our AMT model. Our proposed model consists of two U-nets: the first U-net transcribes the spectrogram into a posteriorgram, and the second U-net transforms the posteriorgram back into a spectrogram. A reconstruction loss is applied between the original and reconstructed spectrograms to constrain the second U-net to focus only on reconstruction. We train our model on three datasets: MAPS, MAESTRO, and MusicNet. Our experiments show that adding the reconstruction loss generally improves note-level transcription accuracy compared with the same model without the reconstruction part. Moreover, it can also boost frame-level precision above that of state-of-the-art models. The feature maps learned by our U-net contain grid-like structures (not present in the baseline model), which implies that, with the reconstruction loss present, the model is probably trying to count along both the time and frequency axes, resulting in higher note-level transcription accuracy.
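
    The two-U-net setup with a reconstruction loss can be illustrated with a short PyTorch sketch. The stand-in network, the binary cross-entropy and MSE loss choices, and the weighting factor `alpha` are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyUNetStandIn(nn.Module):
    """Hypothetical stand-in for the paper's U-nets; one conv keeps the sketch runnable."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 1, kernel_size=3, padding=1)

    def forward(self, x):                  # (batch, 1, freq, time) -> same shape
        return self.conv(x)

transcriber = TinyUNetStandIn()            # first U-net: spectrogram -> posteriorgram
reconstructor = TinyUNetStandIn()          # second U-net: posteriorgram -> spectrogram

def loss_fn(spectrogram, pitch_labels, alpha=1.0):
    # spectrogram: (batch, 1, freq, time); pitch_labels: same shape, values in [0, 1]
    posteriorgram = torch.sigmoid(transcriber(spectrogram))
    reconstruction = reconstructor(posteriorgram)
    transcription_loss = F.binary_cross_entropy(posteriorgram, pitch_labels)
    # Reconstruction loss between original and reconstructed spectrograms, as in the abstract.
    reconstruction_loss = F.mse_loss(reconstruction, spectrogram)
    return transcription_loss + alpha * reconstruction_loss

spec = torch.rand(2, 1, 229, 100)          # toy spectrogram batch
labels = torch.randint(0, 2, spec.shape).float()
print(loss_fn(spec, labels))
```

    Removing the `alpha * reconstruction_loss` term recovers the baseline model the abstract compares against.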

    Towards a global partnership model in interprofessional education for cross-sector problem-solving

    Objectives: A partnership model in interprofessional education (IPE) is important in promoting a sense of global citizenship while preparing students for cross-sector problem-solving. However, the literature remains scant in providing useful guidance for the development of an IPE programme co-implemented by external partners. In this pioneering study, we describe the processes of forging global partnerships in co-implementing IPE and evaluate the programme in light of the preliminary data available.
    Methods: This study is primarily quantitative. We collected data from a total of 747 health and social care students from four higher education institutions. We used a descriptive narrative format and a quantitative design to present our experiences of running IPE with external partners, and performed independent t-tests and analysis of variance to examine pretest and posttest mean differences in the students' data.
    Results: We identified factors important in establishing a cross-institutional IPE programme: complementarity of expertise, mutual benefits, internet connectivity, interactivity of design, and time difference. We found significant pretest–posttest differences in students' readiness for interprofessional learning (teamwork and collaboration, positive professional identity, and roles and responsibilities). We also found a significant decrease in students' social interaction anxiety after the IPE simulation.
    Conclusions: The narrative of our experiences described in this manuscript could be considered by higher education institutions seeking to forge meaningful external partnerships in their effort to establish interprofessional global health education.
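
    To illustrate the analyses named in the Methods, here is a brief SciPy sketch; all scores and groups are hypothetical stand-ins, not the study's data.

```python
import numpy as np
from scipy import stats

# Hypothetical readiness scores on a 5-point scale (purely illustrative).
pretest = np.array([3.8, 4.1, 3.5, 4.0, 3.9, 3.7])
posttest = np.array([4.2, 4.4, 3.9, 4.3, 4.1, 4.0])

# Independent-samples t-test on pretest vs. posttest means, as named in the abstract.
t_stat, p_value = stats.ttest_ind(pretest, posttest)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

# One-way ANOVA comparing posttest scores across three hypothetical institutions.
inst_a = np.array([4.2, 4.4, 3.9])
inst_b = np.array([4.3, 4.1, 4.0])
inst_c = np.array([4.5, 4.2, 4.1])
f_stat, p_anova = stats.f_oneway(inst_a, inst_b, inst_c)
print(f"F = {f_stat:.2f}, p = {p_anova:.3f}")
```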